{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "bhmud6TUPIE1",
    "papermill": {
     "duration": 0.011558,
     "end_time": "2021-04-20T21:06:30.038601",
     "exception": false,
     "start_time": "2021-04-20T21:06:30.027043",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "# Using PGMPY in the Analysis of the Impact of 401(k) on Net Financial Wealth\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "_execution_state": "idle",
    "_uuid": "051d70d956493feee0c6d64651c6a088724dca2a",
    "id": "t1xb29BvPIE4",
    "papermill": {
     "duration": 25.408317,
     "end_time": "2021-04-20T21:06:55.456487",
     "exception": false,
     "start_time": "2021-04-20T21:06:30.048170",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "!pip install pgmpy"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "LWHBvyEdP5pI"
   },
   "outputs": [],
   "source": [
    "import warnings\n",
    "warnings.simplefilter('ignore')  # keep notebook output free of library warnings\n",
    "\n",
    "import matplotlib.pyplot as plt\n",
    "import networkx as nx\n",
    "from pgmpy.models import BayesianNetwork"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "C9Uqph8wPIE7",
    "papermill": {
     "duration": 0.010708,
     "end_time": "2021-04-20T21:06:55.479480",
     "exception": false,
     "start_time": "2021-04-20T21:06:55.468772",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "# Graphs for 401(k) Analysis\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "B2_fiQOEPIE7",
    "papermill": {
     "duration": 0.010554,
     "end_time": "2021-04-20T21:06:55.500684",
     "exception": false,
     "start_time": "2021-04-20T21:06:55.490130",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "Here we have\n",
    " * $Y$ -- net financial assets;\n",
    " * $X$ -- worker characteristics (income, family size, other retirement plans; see lecture notes for details);\n",
    " * $F$ -- latent (unobserved) firm characteristics;\n",
    " * $D$ -- 401(k) eligibility, determined by $F$ and $X$."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "-U6yQWBgPIE7",
    "papermill": {
     "duration": 0.010886,
     "end_time": "2021-04-20T21:06:55.522231",
     "exception": false,
     "start_time": "2021-04-20T21:06:55.511345",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "**One graph (where $F$ determines $X$):**\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "begOa2wrY93O"
   },
   "outputs": [],
   "source": [
    "G = BayesianNetwork(ebunch=[('D', 'Y'),\n",
    "                            ('X', 'D'),\n",
    "                            ('F', 'X'),\n",
    "                            ('F', 'D'),\n",
    "                            ('X', 'Y')],\n",
    "                    latents=['F'])\n",
    "nx.draw_planar(G, with_labels=True)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "xwy7BUBGPIE7",
    "papermill": {
     "duration": 0.011939,
     "end_time": "2021-04-20T21:06:56.517753",
     "exception": false,
     "start_time": "2021-04-20T21:06:56.505814",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "**List the minimal adjustment sets that identify the causal effect $D \\to Y$:**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "WmA30w14PIE7",
    "papermill": {
     "duration": 0.960286,
     "end_time": "2021-04-20T21:06:56.493094",
     "exception": false,
     "start_time": "2021-04-20T21:06:55.532808",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "from pgmpy.inference.CausalInference import CausalInference\n",
    "\n",
    "# Pass G directly so that the `latents` declared on it are kept.\n",
    "inference = CausalInference(G)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "r0Y8t6Xpldqo"
   },
   "outputs": [],
   "source": [
    "try:\n",
    "    print(inference.get_all_backdoor_adjustment_sets('D', 'Y'))\n",
    "except ValueError as e:\n",
    "    print(e.args[0])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "1sN1LTGplguG"
   },
   "outputs": [],
   "source": [
    "inference.is_valid_backdoor_adjustment_set('D', 'Y', \"X\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "gkQCVQpoPIE8",
    "papermill": {
     "duration": 0.012126,
     "end_time": "2021-04-20T21:06:56.600815",
     "exception": false,
     "start_time": "2021-04-20T21:06:56.588689",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "**What is the underlying principle?**\n",
    "\n",
    "Here conditioning on $X$ blocks the backdoor paths from $D$ to $Y$ (Pearl). pgmpy correctly finds $X$ (and makes many more correct decisions when we consider more elaborate structures). Why do we want to consider more elaborate structures? The empirical problem itself requires us to do so!"
   ]
  },
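  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a brief sketch of the principle in formulas (assuming, beyond what is stated in this notebook, that $D$ is binary and that $0 < P(D = 1 \\mid X) < 1$): when $X$ satisfies the backdoor criterion relative to $(D, Y)$, the interventional mean is identified by the adjustment formula\n",
    "\n",
    "$$E[Y \\mid do(D = d)] = E\\big[E[Y \\mid D = d, X]\\big],$$\n",
    "\n",
    "so the average effect of eligibility can be written as\n",
    "\n",
    "$$E[Y \\mid do(D = 1)] - E[Y \\mid do(D = 0)] = E\\big[E[Y \\mid D = 1, X] - E[Y \\mid D = 0, X]\\big].$$"
   ]
  },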
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "SyCZsVL2PIE8",
    "papermill": {
     "duration": 0.012073,
     "end_time": "2021-04-20T21:06:56.625022",
     "exception": false,
     "start_time": "2021-04-20T21:06:56.612949",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "**Another Graph (where $X$ determines $F$):**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "6l83sAd8PIE8",
    "papermill": {
     "duration": 0.380125,
     "end_time": "2021-04-20T21:06:57.017337",
     "exception": false,
     "start_time": "2021-04-20T21:06:56.637212",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "G2 = BayesianNetwork(ebunch=[('D', 'Y'),\n",
    "                             ('X', 'D'),\n",
    "                             ('X', 'F'),\n",
    "                             ('F', 'D'),\n",
    "                             ('X', 'Y')],\n",
    "                     latents=['F'])\n",
    "nx.draw_planar(G2, with_labels=True)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "DZ7-ktDa5Rs1"
   },
   "outputs": [],
   "source": [
    "inference2 = CausalInference(G2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "qXhEMpdXPIE9",
    "papermill": {
     "duration": 0.056051,
     "end_time": "2021-04-20T21:06:57.094387",
     "exception": false,
     "start_time": "2021-04-20T21:06:57.038336",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "try:\n",
    "    print(inference2.get_all_backdoor_adjustment_sets('D', 'Y'))\n",
    "except ValueError as e:\n",
    "    print(e.args[0])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "A6kFc2R1PIE9",
    "papermill": {
     "duration": 0.014464,
     "end_time": "2021-04-20T21:06:57.130426",
     "exception": false,
     "start_time": "2021-04-20T21:06:57.115962",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "**One more graph (encompassing the previous ones), where $(F, X)$ are jointly determined by latent factors $A$.**\n",
    "\n",
    "In fact, we can allow the whole triple $(D, F, X)$ to be jointly determined by latent factors $A$."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "HJYKqepTPIE-",
    "papermill": {
     "duration": 0.014208,
     "end_time": "2021-04-20T21:06:57.159030",
     "exception": false,
     "start_time": "2021-04-20T21:06:57.144822",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "This is a much more realistic graph to consider."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "7O0MbTDCPIE-",
    "papermill": {
     "duration": 0.393034,
     "end_time": "2021-04-20T21:06:57.567429",
     "exception": false,
     "start_time": "2021-04-20T21:06:57.174395",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "G3 = BayesianNetwork(ebunch=[('D', 'Y'),\n",
    "                             ('X', 'D'),\n",
    "                             ('F', 'D'),\n",
    "                             ('A', 'F'),\n",
    "                             ('A', 'X'),\n",
    "                             ('A', 'D'),\n",
    "                             ('X', 'Y')],\n",
    "                     latents=['F', 'A'])  # A is latent as well, per the description above\n",
    "\n",
    "nx.draw_planar(G3, with_labels=True)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "eHVC2QdO5pN5"
   },
   "outputs": [],
   "source": [
    "inference3 = CausalInference(G3)\n",
    "try:\n",
    "    print(inference3.get_all_backdoor_adjustment_sets('D', 'Y'))\n",
    "except ValueError as e:\n",
    "    print(e.args[0])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "Mjus09VjPIE-",
    "papermill": {
     "duration": 0.016255,
     "end_time": "2021-04-20T21:06:57.599705",
     "exception": false,
     "start_time": "2021-04-20T21:06:57.583450",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "# Threat to Identification:\n",
    "\n",
    "What if $F$ also directly affects $Y$? (Note that there are no valid adjustment sets in this case.)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "NjqrWLhXPIE-",
    "papermill": {
     "duration": 0.40878,
     "end_time": "2021-04-20T21:06:58.024720",
     "exception": false,
     "start_time": "2021-04-20T21:06:57.615940",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "G4 = BayesianNetwork(ebunch=[('D', 'Y'),\n",
    "                             ('X', 'D'),\n",
    "                             ('F', 'D'),\n",
    "                             ('A', 'F'),\n",
    "                             ('A', 'X'),\n",
    "                             ('A', 'D'),\n",
    "                             ('F', 'Y'),\n",
    "                             ('X', 'Y')],\n",
    "                     latents=['F', 'A'])\n",
    "\n",
    "nx.draw_planar(G4, with_labels=True)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "BfQZaKqFPIE_",
    "papermill": {
     "duration": 0.046811,
     "end_time": "2021-04-20T21:06:58.089616",
     "exception": false,
     "start_time": "2021-04-20T21:06:58.042805",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "inference4 = CausalInference(G4)\n",
    "try:\n",
    "    print(inference4.get_all_backdoor_adjustment_sets('D', 'Y'))\n",
    "except ValueError as e:\n",
    "    print(e.args[0])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "1Vz8Ply-PIE_",
    "papermill": {
     "duration": 0.017351,
     "end_time": "2021-04-20T21:06:58.124909",
     "exception": false,
     "start_time": "2021-04-20T21:06:58.107558",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "Note the error here! There is no valid adjustment set (among the observed variables)."
   ]
  },
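  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One way to see why, as a sketch in formulas: in the graph above, the backdoor path $D \\leftarrow F \\to Y$ can only be blocked by conditioning on $F$. If $F$ were observed, $\\{X, F\\}$ would satisfy the backdoor criterion, giving\n",
    "\n",
    "$$E[Y \\mid do(D = d)] = E\\big[E[Y \\mid D = d, X, F]\\big].$$\n",
    "\n",
    "Because $F$ is latent, the right-hand side cannot be computed from the observed variables $(Y, D, X)$, and in general $E\\big[E[Y \\mid D = d, X]\\big]$ differs from it."
   ]
  },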
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "yF5-jrbKPIE_",
    "papermill": {
     "duration": 0.017324,
     "end_time": "2021-04-20T21:06:58.159612",
     "exception": false,
     "start_time": "2021-04-20T21:06:58.142288",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "**How can F affect Y directly? Is it reasonable?**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "Tn6bYdP6PIE_",
    "papermill": {
     "duration": 0.017002,
     "end_time": "2021-04-20T21:06:58.193722",
     "exception": false,
     "start_time": "2021-04-20T21:06:58.176720",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "Introduce the match amount $M$. The match amount is a potentially important mediator (why a mediator?). $M$ is not observed. Luckily, adjusting for $X$ still works if there is no arrow $F \\to M$."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "L24XPvB8E7Mk"
   },
   "outputs": [],
   "source": [
    "G5 = BayesianNetwork(ebunch=[('D', 'Y'),\n",
    "                             ('X', 'D'),\n",
    "                             ('F', 'D'),\n",
    "                             ('A', 'F'),\n",
    "                             ('A', 'X'),\n",
    "                             ('A', 'D'),\n",
    "                             ('D', 'M'),\n",
    "                             ('M', 'Y'),\n",
    "                             ('X', 'M'),\n",
    "                             ('X', 'Y')],\n",
    "                     latents=['F', 'A', 'M'])\n",
    "\n",
    "nx.draw_planar(G5, with_labels=True)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "zeHCBCtZFObD"
   },
   "outputs": [],
   "source": [
    "inference5 = CausalInference(G5)\n",
    "try:\n",
    "    print(inference5.get_all_backdoor_adjustment_sets('D', 'Y'))\n",
    "except ValueError as e:\n",
    "    print(e.args[0])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "o9-rScGhPIFA",
    "papermill": {
     "duration": 0.019066,
     "end_time": "2021-04-20T21:06:58.635211",
     "exception": false,
     "start_time": "2021-04-20T21:06:58.616145",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "If there is an $F \\to M$ arrow, then adjusting for $X$ is not sufficient."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "IGI7piKFPIFA",
    "papermill": {
     "duration": 0.365274,
     "end_time": "2021-04-20T21:06:59.019538",
     "exception": false,
     "start_time": "2021-04-20T21:06:58.654264",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "G6 = BayesianNetwork(ebunch=[('D', 'Y'),\n",
    "                             ('X', 'D'),\n",
    "                             ('F', 'D'),\n",
    "                             ('A', 'F'),\n",
    "                             ('A', 'X'),\n",
    "                             ('A', 'D'),\n",
    "                             ('D', 'M'),\n",
    "                             ('F', 'M'),\n",
    "                             ('M', 'Y'),\n",
    "                             ('X', 'M'),\n",
    "                             ('X', 'Y')],\n",
    "                     latents=['F', 'A', 'M'])\n",
    "\n",
    "nx.draw_planar(G6, with_labels=True)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "qNc9JP-fFdVn"
   },
   "outputs": [],
   "source": [
    "inference6 = CausalInference(G6)\n",
    "try:\n",
    "    print(inference6.get_all_backdoor_adjustment_sets('D', 'Y'))\n",
    "except ValueError as e:\n",
    "    print(e.args[0])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "X7dEsQCptm65"
   },
   "source": [
    "Again, note the error here. There is no valid adjustment set."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "oMDvjvSNPIFA",
    "papermill": {
     "duration": 0.020751,
     "end_time": "2021-04-20T21:06:59.062138",
     "exception": false,
     "start_time": "2021-04-20T21:06:59.041387",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "# Question:\n",
    "\n",
    "Given the analysis above, do you find adjusting for workers' characteristics a credible strategy for identifying the causal (total) effect of 401(k) eligibility on net financial wealth?\n"
   ]
  }
 ],
 "metadata": {
  "colab": {
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.5"
  },
  "papermill": {
   "default_parameters": {},
   "duration": 32.278732,
   "end_time": "2021-04-20T21:06:59.193141",
   "environment_variables": {},
   "exception": null,
   "input_path": "__notebook__.ipynb",
   "output_path": "__notebook__.ipynb",
   "parameters": {},
   "start_time": "2021-04-20T21:06:26.914409",
   "version": "2.2.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
